National Capital District
Mechanistic Interpretability with SAEs: Probing Religion, Violence, and Geography in Large Language Models
Simbeck, Katharina, Mahran, Mariam
Despite growing research on bias in large language models (LLMs), most work has focused on gender and race, with little attention to religious identity. This paper explores how religion is internally represented in LLMs and how it intersects with concepts of violence and geography. Using mechanistic interpretability and Sparse Autoencoders (SAEs) via the Neuronpedia API, we analyze latent feature activations across five models. We measure overlap between religion- and violence-related prompts and probe semantic patterns in activation contexts. While all five religions show comparable internal cohesion, Islam is more frequently linked to features associated with violent language. In contrast, geographic associations largely reflect real-world religious demographics, revealing how models embed both factual distributions and cultural stereotypes. These findings highlight the value of structural analysis in auditing not just outputs but also internal representations that shape model behavior.
- North America > United States > New York > New York County > New York City (0.28)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Middle East > Palestine > Gaza Strip > Gaza Governorate > Gaza (0.14)
- (225 more...)
On Language Models' Sensitivity to Suspicious Coincidences
Padmanabhan, Sriram, Misra, Kanishka, Mahowald, Kyle, Choi, Eunsol
Humans are sensitive to suspicious coincidences when generalizing inductively over data, as they make assumptions as to how the data was sampled. This results in smaller, more specific hypotheses being favored over more general ones. For instance, when provided the set {Austin, Dallas, Houston}, one is more likely to think that this is sampled from "Texas Cities" over "US Cities" even though both are compatible. Suspicious coincidence is strongly connected to pragmatic reasoning, and can serve as a testbed to analyze systems on their sensitivity towards the communicative goals of the task (i.e., figuring out the true category underlying the data). In this paper, we analyze whether suspicious coincidence effects are reflected in language models' (LMs) behavior. We do so in the context of two domains: 1) the number game, where humans made judgments of whether a number (e.g., 4) fits a list of given numbers (e.g., 16, 32, 2); and 2) by extending the number game setup to prominent cities. For both domains, the data is compatible with multiple hypotheses and we study which hypothesis is most consistent with the models' behavior. On analyzing five models, we do not find strong evidence for suspicious coincidences in LMs' zero-shot behavior. However, when provided access to the hypotheses space via chain-of-thought or explicit prompting, LMs start to show an effect resembling suspicious coincidences, sometimes even showing effects consistent with humans. Our study suggests that inductive reasoning behavior in LMs can be enhanced with explicit access to the hypothesis landscape.
- North America > United States > Texas (0.24)
- Asia > Indonesia > Java > Jakarta > Jakarta (0.05)
- South America > Brazil > São Paulo (0.04)
- (34 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.94)
Neural Combinatorial Optimization for Real-World Routing
Son, Jiwoo, Zhao, Zhikai, Berto, Federico, Hua, Chuanbo, Kwon, Changhyun, Park, Jinkyoo
Vehicle Routing Problems (VRPs) are a class of NP-hard problems ubiquitous in several real-world logistics scenarios that pose significant challenges for optimization. Neural Combinatorial Optimization (NCO) has emerged as a promising alternative to classical approaches, as it can learn fast heuristics to solve VRPs. However, most research works in NCO for VRPs focus on simplified settings, which do not account for asymmetric distances and travel durations that cannot be derived by simple Euclidean distances and unrealistic data distributions, hindering real-world deployment. This work introduces RRNCO (Real Routing NCO) to bridge the gap of NCO between synthetic and real-world VRPs in the critical aspects of both data and modeling. First, we introduce a new, openly available dataset with real-world data containing a diverse dataset of locations, distances, and duration matrices from 100 cities, considering realistic settings with actual routing distances and durations obtained from Open Source Routing Machine (OSRM). Second, we propose a novel approach that efficiently processes both node and edge features through contextual gating, enabling the construction of more informed node embedding, and we finally incorporate an Adaptation Attention Free Module (AAFM) with neural adaptive bias mechanisms that effectively integrates not only distance matrices but also angular relationships between nodes, allowing our model to capture rich structural information. RRNCO achieves state-of-the-art results in real-world VRPs among NCO methods. We make our dataset and code publicly available at https://github.com/ai4co/real-routing-nco.
- Asia > East Asia (0.05)
- Europe > Northern Europe (0.05)
- Asia > Southeast Asia (0.05)
- (80 more...)
Fraud-R1 : A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements
Yang, Shu, Zhu, Shenzhe, Wu, Zeyu, Wang, Keyu, Yao, Junchi, Wu, Junchao, Hu, Lijie, Li, Mengdi, Wong, Derek F., Wang, Di
We introduce Fraud-R1, a benchmark designed to evaluate LLMs' ability to defend against internet fraud and phishing in dynamic, real-world scenarios. Fraud-R1 comprises 8,564 fraud cases sourced from phishing scams, fake job postings, social media, and news, categorized into 5 major fraud types. Unlike previous benchmarks, Fraud-R1 introduces a multi-round evaluation pipeline to assess LLMs' resistance to fraud at different stages, including credibility building, urgency creation, and emotional manipulation. Furthermore, we evaluate 15 LLMs under two settings: 1. Helpful-Assistant, where the LLM provides general decision-making assistance, and 2. Role-play, where the model assumes a specific persona, widely used in real-world agent-based interactions. Our evaluation reveals the significant challenges in defending against fraud and phishing inducement, especially in role-play settings and fake job postings. Additionally, we observe a substantial performance gap between Chinese and English, underscoring the need for improved multilingual fraud detection capabilities.
- North America > Canada > Ontario > Toronto (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > China > Beijing > Beijing (0.05)
- (20 more...)
- Law Enforcement & Public Safety > Fraud (1.00)
- Information Technology > Security & Privacy (1.00)
Back to School: Translation Using Grammar Books
Hus, Jonathan, Anastasopoulos, Antonios
Machine translation systems for high resource languages perform exceptionally well and produce high quality translations. Unfortunately, the vast majority of languages are not considered high resource and lack the quantity of parallel sentences needed to train such systems. These under-represented languages are not without resources, however, and bilingual dictionaries and grammar books are available as linguistic reference material. With current large language models (LLMs) supporting near book-length contexts, we can begin to use the available material to ensure advancements are shared among all of the world's languages. In this paper, we demonstrate incorporating grammar books in the prompt of GPT-4 to improve machine translation and evaluate the performance on 16 topologically diverse low-resource languages, using a combination of reference material to show that the machine translation performance of LLMs can be improved using this method.
- Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
- Oceania > Solomon Islands (0.04)
- Oceania > Papua New Guinea > Central Province > National Capital District > Port Moresby (0.04)
- (19 more...)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Hitting the Books: How one of our first 'smart' weapons helped stop the Nazis
At the outset of World War II, you'd have a better chance of finding a needle in a haystack with a camel stuck in its eye than you did shooting down an enemy aircraft in your first dozen or so shots. This is because anti-aircraft shells at the time used manual fuses that had to be dialed in for specific lengths of time to delay their explosion. The idea was that you'd estimate where the targeted plane would be in, say five seconds, based on its currently flight path, then time the shell for that length, fire the shell at the plane and hope that the timing and location were close enough that shrapnel from the exploding shell hits the plane. If your calculations were off by even a hair, the shell would miss by thousands of feet. And if shooting down piloted aircraft was this hard, intercepting Germany's terrifyingly fast V1 and V2 rockets required far more luck than skill. But that's exactly what the team at Section T set out to do.
- Europe > Germany (0.24)
- Pacific Ocean > South Pacific Ocean > Coral Sea (0.05)
- Oceania > Papua New Guinea > Central Province > National Capital District > Port Moresby (0.05)
- (6 more...)